Welcome!
MBASQL.com
Have you had your coffee yet?
What are we covering?
1. Introduction: A brief introduction to the tutors, tools and the course
2. Writing your first query: A first take on SQL
3. Where and Operands: Learn how to filter data
4. Aggregate functions: Learn how to transform data
5. Group By: How we can aggregate on different cuts of data
6. Order By: Sorting data
7. Case Statements: Creating different results depending on certain GVs
8. Like: Compare data to a text pattern
9. Having: Filter data after we have extracted it
10. Joins: Extract data from multiple tables
11. Nested Queries: Querying a query
12. Dates: Dealing with dates
20 minutes
20 minutes
30 minutes
30 minutes
20 minutes
30 minutes
40 minutes
15 minutes
15 minutes
30 minutes
30 minutes
30 minutes
How to get the most out of this course?
Principles: Use the content of this course to understand the fundamentals. They
will be useful in answering real life data problems you come across.
Repeat: Practice, Repeat and introduce your own variations. It is important to
make mistakes, you will learn through trying and discovering.
Be curious: The online store provides a lot of data that you can practice with.
You have lifetime access to this and it provides a safe place to practice!
Welcome!
MBASQL.com
Have you had your coffee yet?
Who we are?
Weldon W. Whitener, PhD Salem Marafi, MBA
Education
PhD Bioinformatics at Cambridge University
researching with super large datasets
Work
4 years at BCG, London advising FTSE 100
companies on projects including Big Data
SQL
Working with data has distinguished me my
peers and greatly helped my progression
Education
IESE MBA, 2014. MSc. Marketing, BSc. MIS,
Computer Sciences, and Management
Work
8 years retail banking. Interned at Facebook
and switched to tech after IESE with Amazon
SQL
I am regularly stood out for knowing how to
access, analyse and bring insights from data
Who is this course for?
MBA Students who are going to manage businesses in the future and
might be required to independently access and manipulate data.
Beginners with absolutely no background in programming but have
used applications like Microsoft Excel in the past.
Future managers who are curious and want to attain a general
understanding of how data is accessed and managed in practice.
What are we covering?
An introduction to Structured Query Language (or SQL) which is the
standard language for accessing and manipulating databases.
Applications of data in a real business setting. We will use a play shopping
site and take the role of a category manager to assist in our learning.
How are we going to structure our lessons?
Watch: Every lesson starts with an introduction to a new concept where we will
cover the theoretical aspects of the functionality.
Play: You will then put on the hat of a category manager and be invited to think
about how each concept can be used in a business setting.
Do: In each lesson we will answer a problem together. This problem will test and
re-affirm all the concepts we learned in the watch portion of the lesson.
Challenge: Once we have completed an example, you will be challenged with
a problem you must solve. We will then present the solution and discuss.
Watch
How does an e-commerce site work?
Database
Product info
Customer info
Etc.
Data stored in
tables with inherent
relationships.
SQL
Instructions to the
database to load
or extract data
Data
Information such
as dates, text,
numbers, etc.
The Web
Data is constantly
being created
and used online.
What’s a relational database?
1
1
Typically, spreadsheets hold
information in tables which can
lead to redundancies.
What’s a relational database?
Like Spreadsheets, databases
hold dates, text, numbers, etc.
in tables. However, they also
hold relationships between
tables
1
2
Typically, spreadsheets hold
information in tables which can
lead to redundancies.
1
2
Play
Let’s get set up!
Visit MBASQL.com and register
as a customer. This will give you
access to:
1
2
The Store: Big Duck Furniture
The SQL Playground
3
Lifetime access to new content
Let’s get set up!
When registering:
1
2
Choose a username you will remember
Enter your full name
3
Use an email address you have access to
4
Enter a password you will remember
Do
The playground is where you will write your SQL
The schema is where you can find out what’s in your database
The shop is our make-believe online furniture store
Challenge
As a customer, browse around the store make 5 different purchases
Writing your first query
It’s not only what you write, but how you write it!
MBASQL.com
Watch
SELECT is a keyword that instructs
the databasse to extract data.
Every query has the same basic structure
SELECT
SOME COLUMN,
SOME OTHER COLUMN
FROM
SOME TABLE
1
1
SELECT is a keyword that instructs
the databasse to extract data.
Every query has the same basic structure
SELECT
SOME COLUMN,
SOME OTHER COLUMN
FROM
SOME TABLE
After every SELECT keyword is a
list of columns you want to
extract from a database table.
Note: Every selected column
must be separated by a comma
1
2
1
2
SELECT is a keyword that instructs
the databasse to extract data.
Every query has the same basic structure
SELECT
SOME COLUMN,
SOME OTHER COLUMN
FROM
SOME TABLE
FROM is a keyword that tells the
database which table(s) to look
in for the columns we selected.
1
2
3
1
3
After every SELECT keyword is a
list of columns you want to
extract from a database table.
Note: Every selected column
must be separated by a comma
2
SELECT is a keyword that instructs
the databasse to extract data.
Every query has the same basic structure
SELECT
SOME COLUMN,
SOME OTHER COLUMN
FROM
SOME TABLE
After every FROM keyword is a list
of tables you want to extract
from in your database.
1
2
3
4
1
4
FROM is a keyword that tells the
database which table(s) to look
in for the columns we selected.
3
After every SELECT keyword is a
list of columns you want to
extract from a database table.
Note: Every selected column
must be separated by a comma
2
Play
Take a tour of the Big Duck Furniture
store. We will be looking at how to
get product information.
Can you think of where and how
some of the information is stored?
If you were a category manager for
this store, what information would
you want to report on?
We now know that information is stored in tables
Do
It is common practice for
organizations to have a place that
lists out what columns are in each
table.
You will be able to quickly gather
where data sits and the type of data
that each column holds.
The most challenging part of all
data extraction is understanding
where the data sits, and what the
data represents. This takes practice.
The Schema lists out what tables we have access to
Using our schema, we found that there is a
table named shop_product that has a text
column name.
With this information we can write our first
basic query in the playground.
Write a query to list all of the products in the store
SELECT
name
FROM
shop_product
Challenge
Write a query that shows the name and price of all products in the store.
Using our schema, we see that the table
shop_product has both a text column for
name and price.
With this information we can quickly get the
solution to our first challenge.
Challenge Solution
SELECT
name,
price
FROM
shop_product
Filters and Operands
We do not always want all the data in a table
MBASQL.com
Watch
We can filter our columns like we would in Excel
SELECT
SOME COLUMN
FROM
SOME TABLE
WHERE
SOME CONDITION
AND ANOTHER CONDITION
AND ANOTHER CONDITION
WHERE is a keyword that tells the
database to impose conditions
on the data that will filter the
output of the query. This is like
EXCEL’s filter.
1
1
Every query has the same basic structure
SELECT
SOME COLUMN
FROM
SOME TABLE
WHERE
SOME CONDITION
AND ANOTHER CONDITION
AND ANOTHER CONDITION
We can introduce as many
conditions as we like using the
AND keyword. Note: there are
no commas here.
1
2
1
2
WHERE is a keyword that tells the
database to impose conditions
on the data that will filter the
output of the query. This is like
EXCEL’s filter.
ARITHMETIC: Used for computations on the output
COMPARISON: Used for comparison between two objects
LOGICAL: Used for evaluating if a condition is true or false
There are three types of operands
+ - * /
%
Five key arithmetic operands
+
ADDITION
Adds values on either side of the operator
10 + 20 will give 30
Arithmetic: Addition
-
Subtraction
Subtracts right number from left number
10 - 20 will give -10
Arithmetic: Subtraction
*
Multiplication
Multiplies values on either side of the operator
10 * 20 will give 200
Arithmetic: Multiplication
/
Division
Divides right number into the left number
20 / 10 will give 2
Arithmetic: Division
%
Modulus
Divides the left value by the right value and
returns the remainder
20 % 10 will give 0
20 % 11 will give 9
Arithmetic: Modulus
= != > <
Four key comparison operators
=
Equals
Checks if the left side equals the right side
10 = 10 will return True
10 = 20 will return False
Comparison: Equals
!=
Does not Equal
Checks if the left does not equal the right side
10 != 10 will return False
10 != 20 will return True
Comparison: Does not Equal
>
Greater than
Checks if the left side is great than the right side
10 > 20 will return False
20 > 10 will return True
Comparison: Greater than
<
Less Than
Checks if the left side is less than the right side
10 < 20 will return True
20 < 10 will return False
Comparison: Less than
AND OR
BETWEEN
IN NOT
Five key logical operators
AND
AND
Checks that all conditions are met
10 = 20 AND 10 = 10 will return False
Logical: AND
OR
OR
Checks that at least one condition is met
10 = 20 OR 10 = 10 will return True
Logical: OR
BETWEEN
Between
Searches values within a range
10 BETWEEN 0 AND 30 will return True
Logical: BETWEEN
IN
IN
Searches values within a list
10 IN (0,1,2,3) will return False
10 IN (0,10,2,3) will return True
Logical: IN
NOT
NOT
Negates a logical statement
10 NOT IN (0,1,2,3) will return True
10 NOT IN (0,10,2,3) will return False
Logical: NOT
Play
Given our new tools, lets look
around the Big Duck Furniture Store
for some anomalies.
How many can you find? Pictured
here are examples of problems
you’d likely encounter in the “real”
world.
We now know that we can filter different columns
1
2
3
1
2
3
Missing category images
Duplicate products
Super high price points
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that product prices are
stored in our shop_product table.
Let us look at the product with a super high price point
We can use the same query we used to get
all products, with a WHERE keyword to filter
only for products with a price greater than
€1000.
Write a query to list products whose price is > 1000
SELECT
name,
price
FROM
shop_product
WHERE
price > 1000
Challenge
Write a query that shows products with prices between €100 and €300.
We can use the same query we used to get all
products, with a WHERE keyword to filter only for
products with a price greater than 100 AND with
a price less than 300.
Challenge Solution
SELECT
name,
price
FROM
shop_product
WHERE
price > 100
AND price < 300
Challenge Solution: Alternative
SELECT
name,
price
FROM
shop_product
WHERE
price BETWEEN 100 AND 300
We can use the same query we used to get
all products, with a WHERE keyword to filter
only for products with a price BETWEEN 100
AND 300.
NOTE: Between is INCLUSIVE
Bonus
Challenge
Write a query that shows products without an image.
This challenge demonstrates why it is important
to understand the types of data in our tables.
Photos are stored as strings. If an image does not
have a photo, then it is stored as an empty string.
To filter for “no photo” we would have to look at
all products with an empty string in the photo
column.
Challenge Solution
SELECT
name,
photo
FROM
shop_product
WHERE
photo = “”
Aggregate Functions and Aliases
We can compute and aggregate data
MBASQL.com
Watch
Functions transforms the data of a given column
SELECT
SOME COLUMN,
FUNCTION(SOME COLUMN) AS ALIAS
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
Functions transform data in
a column. They live in the
SELECT portion of a query
1
1
Alias allow us to rename different parts of our query
SELECT
SOME COLUMN,
FUNCTION(SOME COLUMN) AS ALIAS
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
Functions transform data in
a column. They live in the
SELECT portion of a query
1
1
2
Aliases allow us to rename
different parts of our query
to make them clearer
2
These are some of the most common aggregate functions
SUM
()
COUNT
()
MIN
()
MAX
()
AVG
()
SUM
SUM
Returns the sum or total of a given column
E.g. SUM(quantity)
will return the total quantities sold
SUM() adds up data in a column
COUNT
COUNT
Returns the total number of values in a column
E.g. COUNT(id)
will return the number of products
COUNT() returns the number of rows or values in a column
MIN
MIN
Returns the minimum value of a column
E.g. MIN(price)
will return the lowest price of all products
MIN() returns the lowest value of a column
MAX
MAX
Returns the maximum value of a column
E.g. MAX (price)
will return the highest price of all products
MAX() returns the highest value of a column
AVG
AVG
Returns the average value of a column
E.g. AVG(price)
will return the average price of all products
AVG() returns the average value of a column
Play
Lets take a look through Big Duck Furniture’s website
through the eyes of a category manager.
If you were managing a category, what kind of
information would you be interested in at an
aggregate level?
Can you think of suggestions other than the three we
listed below?
What information is only interesting when aggregated?
1
2
3
Total sales by product category
Total traffic to your category
Average order quantity
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that product prices are
stored in our shop_product table.
Find the average price of all your products
We use the aggregate function AVG() on the
column price from out shop_product table. This will
output the average price of all products in the table.
Write a query to get the average of all products
SELECT
AVG(price) AS “Average Price”
FROM
shop_product
Challenge
What is the average price of all products if we exclude the expensive lamp?
We use the aggregate function AVG() on the
column price from out shop_product table.
This will output the average price of all products in
the table.
We do this only for products less than 9,999.99 by
applying a filter in our WHERE clause.
Challenge Solution
SELECT
AVG(price) AS “Average Price”
FROM
shop_product
WHERE
price < 9999.99
Bonus
Challenge
How many people have seen our products?
We know that our views are stored in
shop_impression. From our schema we see that
for every record, there is a view ID column. These
are impressions that we can count using the
COUNT() function.
Challenge Solution
SELECT
COUNT(id)
FROM
shop_impression
Double Bonus
Challenge
How many impressions have we seen excluding anonymous users?
We know that our views are stored in
shop_impression. If we select everything in the
table we will find that some user ID’s are stored
as “None”. This value is known as a NULL value.
One way of addressing this problem is to count
the ID’s that are greater than 0.
Challenge Solution
SELECT
COUNT(id)
FROM
shop_impression
WHERE
user_id > 0
Another solution is to select all rows where the
user_id is not empty. An empty cell is denoted as
a NULL cell which is different than a blank cell “”.
Filtering for NULL values is also special. We use
the keyword IS in combination with NOT to
negate the condition.
Challenge Solution: Alternative
SELECT
COUNT(id)
FROM
shop_impression
WHERE
user_id IS NOT NULL
Group By
♪ Come together, right now, over data! ♪
MBASQL.com
Watch
We need to group columns when using aggregate functions
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
GROUP BY SOME COLUMN_1
GROUP BY allows us to group results
by the columns we specify.
Note (1): We always group by every
column we are not aggregating by,
in this case COLUMN_1.
Note (2): We never use aliases or
aggregate functions here.
1
1
We need to group columns when using aggregate functions
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
GROUP BY SOME COLUMN_1
If we eliminate the GROUP BY
keyword, then the data will not
aggregate properly and you will
receive an error.
Curious? Try it out.
2
2
We need to group columns when using aggregate functions
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
GROUP BY SOME COLUMN_1
If we remove the aggregates and
keep the GROUP BY keyword, then
the data will return unique rows.
This is the equivalent to using the
keyword DISTINCT in our SELECT
statement.
3
3
We need to group columns when using aggregate functions
SELECT
DISTINCT SOME COLUMN_1
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
GROUP BY SOME COLUMN_1
We can use the keyword DISTINCT to
return only unique records. This is
important if our output has duplicates
that we do not want.
4
4
Play
Again, lets take a look through Big Duck Furniture’s
website as a category manager.
If you were managing a category, what kind of
information would you be interested in at an
aggregate level?
And this time, think about where you would want
more granular information than just the count of
products on the website or average price?
How can we get more granular aggregated data?
1
2
3
Average order size per customer
Number of products per category
Maximum priced product per category
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that orders are stored in
the shop_orderitem table at a
product level.
Write a query to get the total orders by product
We use the COUNT() aggregate function and the
product_id column to accomplish our goal.
We GROUP BY the product_id column. Note that we
do not have any alias after the column name.
Write a query to get the total orders by product
SELECT
product_id AS “Product”,
COUNT(id) AS “Total Orders”
FROM
shop_orderitem
GROUP BY
product_id
Challenge
How many products do we have in each category?
We use the aggregate function COUNT() on the
column id from our shop_product table to count
unique products.
We GROUP BY our category column so that we can
split the output on a category level.
Challenge Solution
SELECT
category_id AS “Category”,
COUNT(id) AS “Products”
FROM
shop_product
GROUP BY
category_id
Bonus
Challenge
What products are the most people looking at?
We know that our views are stored in the
shop_impression table. From our schema we also
know that for every record we have an ID that we
can use as a counter.
Therefore to count impressions we use the COUNT()
function on the ID column and GROUP BY our cut of
interest: product_id.
Challenge Solution
SELECT
product_id AS “Product”,
COUNT(id) AS “Impressions”
FROM
shop_impression
GROUP BY
product_id
Order By and Limit
Sorting your data can come in handy
MBASQL.com
Watch
We can sort our columns like we would in Excel
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN_1
ORDER BY
SOME COLUMN_1 DESC,
ANOTHER COLUMN ASC
ORDER BY allows us sort our results by
one or more columns in a ascending or
descending order. We use ASC for
ascending, and DESC fo rdescending
order.
1
1
We can sort our columns like we would in Excel
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN_1
ORDER BY
SOME COLUMN_1 DESC,
ANOTHER COLUMN ASC
LIMIT 10
ORDER BY allows us sort our results by
one or more columns in a ascending or
descending order. We use ASC for
ascending, and DESC fo rdescending
order.
1
1
2
We can limit the number of rows in the
results by using the keyword LIMIT
2
Play
Lets again take a look through Big Duck
Furniture through the eyes of a category
manager.
Sorting can come in handy when you want to
see the top or bottom rows. Other than the
three cases below, what else would you be
interested in seeing?
When would we need to order and limit our data?
1
2
3
Top 10 customers
Top 10 products solds
Top 10 products bought
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that products are stored in
the shop_product table.
What are the top 10 highest priced items in your catalogue?
We use the shop_product table to extract the
names and prices of our products. We sort the
results in DESCending order and LIMIT the results
to 10 rows.
What are the top 10 highest priced items in your catalogue?
SELECT
name,
price
FROM
shop_product
ORDER BY
price DESC
LIMIT 10
Challenge
What are the top 3 categories ranked by average product price?
We use the aggregate function AVG() on the
column price from our shop_product table to
get the average price. We cut this on
category_id by using our GROUP BY clause.
Finally, we order by the average price of each
category using AVG(price)
Challenge Solution
SELECT
category_id,
AVG(price) AS “Avg. Price”
FROM
shop_product
GROUP BY
category_id
ORDER BY
AVG(price) DESC
Case
You go here, you go there: bucketing data on conditions is super useful!
MBASQL.com
Watch
Bucketing data using case statements is extremely useful
SELECT SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END AS ALIAS
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END
CASE statements create a new column
1
1
Bucketing data using case statements is extremely useful
SELECT SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END AS ALIAS
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END
CASE statements create a new column
1
1
2
Values are determined by conditions
2
Bucketing data using case statements is extremely useful
SELECT SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END AS ALIAS
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END
CASE statements create a new column
1
1
2
3
Values are determined by conditions
2
If no condition is met, use a default value
3
Bucketing data using case statements is extremely useful
SELECT SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END AS ALIAS
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END
CASE statements create a new column
1
1
2
3
4
Values are determined by conditions
2
If no condition is met, use a default value
3
Every CASE statement ends with END
4
Bucketing data using case statements is extremely useful
SELECT SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END AS ALIAS
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN,
CASE
WHEN CONDITION THEN VALUE
ELSE DEFAULT VALUE
END
CASE statements create a new column
1
1
2
3
4
Values are determined by conditions
2
If no condition is met, use a default value
3
Every CASE statement ends with END
4
If your CASE statement does not
aggregate any data, then you must
include it in GROUP BY statement
5
5
Play
Take a look through Big Duck Furniture through
the eyes of a category manager.
Bucketing is a useful tool when assessing
segments. What kind of segmentation can you
think of?
How would you use CASE to segment data?
1
2
3
Customer segments by purchase volume
Product segments by price
Product segment by traffic
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that products are stored in
the shop_product table.
Segment your products by price : Cheap, Expensive, Average
Segment your products by price : Cheap, Expensive, Average
SELECT name,
CASE
WHEN price < 100 THEN ‘cheap’
WHEN price > 200 THEN ‘expensive’
ELSE ‘average’
END AS “price_range”
FROM shop_product
GROUP BY name,
CASE
WHEN price < 100 THEN ‘cheap’
WHEN price > 200 THEN ‘expensive’
ELSE ‘average’
END
Bucketing is a useful tool when assessing
segments.
In this example, we want to bucket our
products according to their price range.
If the product is priced less than 100, we
classify it as “cheap”.
If the price is more than 200, then it is
expensive”.
Otherwise, the price is just “average”.
Challenge
Segment products purchases number: If more than 10 have sold, it is popular.
We use the aggregate function SUM() inside
our CASE statement to help us accomplish
our goal.
Note: We do not use aggregate functions in
our group by, and so we do not include the
CASE in the GROUP BY part of the query.
Challenge Solution
SELECT
product_id,
CASE
WHEN SUM(quantity) > 10 THEN ‘Popular’
ELSE 'Regular'
END AS 'Popularity’
FROM
shop_orderitem
GROUP BY
product_id
Like
Sort of equal, but not really.
MBASQL.com
Watch
Like helps you locate similar data without much extra code
SELECT
SOME COLUMN
FROM
SOME TABLE
WHERE
SOME COLUMN LIKE SOME PATTERN
The keyword LIKE allows you to
compare a column value to a
text pattern.
1
1
Like helps you locate similar data without much extra code
SELECT
name
FROM
shop_product
WHERE
name LIKE “Chair”
This will match any product that
has a name LIKE the text string
“Chair”.
It will not match “Red Chair”.
It will not match “Chairiot”.
It will not match “Chairs”.
This means without anything, LIKE
is like an equals sign.
2
2
Like helps you locate similar data without much extra code
SELECT
name
FROM
shop_product
WHERE
name LIKE “%Chair”
If we introduce a % at the
beginning of our pattern, we
can now match anything that
ends with the word chair.
It will match “Red Chair”.
It will not match “Chairiot”.
It will not match “Chairs”.
3
3
Like helps you locate similar data without much extra code
SELECT
name
FROM
shop_product
WHERE
name LIKE “Chair%”
If we introduce a % at the end of
our pattern, we can then match
anything that starts with the
word chair.
It will not match “Red Chair”.
It will match “Chairiot”.
It will match “Chairs”.
4
4
Like helps you locate similar data without much extra code
SELECT
name
FROM
shop_product
WHERE
name LIKE “%Chair%”
If we introduce a % at both ends
of our pattern, we can then
match anything that contains
the word chair.
It will match “Red Chair”.
It will match “Chairiot”.
It will match “Chairs”.
5
5
Play
Take a look through Big Duck Furniture through the eyes of a
category manager.
Sometimes you want to find the presence of a pattern. This is
especially useful when diving deep into catalogue issues.
Can you name a few cases where LIKE could come in useful?
When would it be useful to search for a pattern?
1
2
3
Find products with keywords in descriptions
Find products with keywords in titles
Find the presence of keywords in bullet points
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that descriptions are
stored in the shop_product table.
What products are likely to be made of wood?
We use the shop_product table to extract the
names and descriptions of our products where
the descriptions have any match to the
pattern “wood”.
What products are likely to be made of wood?
SELECT
name,
description
FROM
shop_product
WHERE
description LIKE “%wood%”
Challenge
What products are likely to have both metal and wood?
To complete this challenge we use our
keyword LIKE twice to compare the same
column to two patterns.
By using AND we denote that both patterns
must match.
Challenge Solution
SELECT
name,
description
FROM
shop_product
WHERE
description LIKE “%metal%”
AND description LIKE “%wood%”
Bonus
Challenge
Find any product with metal mentioned in the bullet points
We can compare the bullet point columns with
the word pattern LIKE “%metal%”.
Note that we use OR to denote that we only
require one of the statements.
Challenge Solution
SELECT
name,
bullet_point_a,
bullet_point_b,
bullet_point_c,
bullet_point_d,
bullet_point_e
FROM shop_product
WHERE
bullet_point_a LIKE “%metal%”
OR bullet_point_b LIKE “%metal%”
OR bullet_point_c LIKE “%metal%”
OR bullet_point_d LIKE “%metal%”
OR bullet_point_e LIKE “%metal%”
Having
We can filter the output of our query even aggregates!
MBASQL.com
Watch
SELECT
SOME COLUMN_1,
FUNCTION(SOME COLUMN_2)
FROM SOME TABLE
WHERE SOME CONDITION
GROUP BY SOME COLUMN_1
HAVING
FUNCTION(COLUMN) CONDITION
AND FUNCTION(COLUMN) CONDITION
1
1
The HAVING keyword is used to
introduce conditions on the output
of a query.
This is useful for constraining
aggregate information by some
condition.
Having allows you to process outputted data from a query
Play
Take a look through Big Duck Furniture through
the eyes of a category manager.
We previously thought of some scenarios where
aggregates would be useful.
Here is a reminder of some:
What information is only interesting when aggregated?
1
2
3
Total sales by product category
Total traffic to your category
Average order quantity
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that product orders are
stored in our shop_orderitem table.
Find products that have sold more than 2 units
SELECT
product_id,
SUM(quantity) AS “Total Units”
FROM shop_order_item
GROUP BY product_id
HAVING
SUM(quantity) > 2
1
1
The HAVING keyword is used here to
filter the output for total quantities of
more than 2.
Find products that have sold more than 2 units
2
2
We select the total units sold for each
product.
However, since we cannot use
aggregates in our WHERE clause, we
resort to our alternative keyword:
HAVING.
Challenge
Look for customers who have browsed more than 5 products
SELECT
user_id,
COUNT(id) AS “Total Impressions”
FROM shop_impression
GROUP BY user_id
HAVING
COUNT(id) > 5
1
Challenge Solution
2
We count unique impressions using COUNT()
and GROUP BY user_id from our
shop_impression table.
To impose our constraint, we use HAVING with
the condition that COUNT of impression ID’s is
greater than 5.
Combining Data
We are often interested in data across more than one table
MBASQL.com
Watch
We will explore common ways of joining tables
LEFT JOIN
INNER JOIN
RIGHT JOIN
Left joins add data from right set to left set where there is overlap
LEFT JOIN
B
Assume table A contains product information
Assume table B contains product sales information.
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Sold Price
1 5 100
2 10 200
4 20 10
A
Left joins add data from right set to left set where there is overlap
A left join will keep all information from table A and join B where the SKUs are the same.
If a SKU exists only in table A, the columns for sales and price will be empty.
Conversely, if a SKU is in table B but not A, that row will be omitted.
A left join B
SKU Name Description Units Sold Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
3 Bed King Size Bed
Left joins add data from right set to left set where there is overlap
1
You will notice we introduced a prefix to
our column names. This is to let the
database know which table we are
referring to when we denote the column.
This removes any confusion if two tables
have columns with the same name.
1
Left joins add data from right set to left set where there is overlap
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
LEFT JOIN ORDERS AS B
ON A.SKU = B.SKU
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
LEFT JOIN ORDERS AS B
ON A.SKU = B.SKU
1
You will notice we introduced a prefix to
our column names. This is to let the
database know which table we are
referring to when we denote the column.
This removes any confusion if two tables
have columns with the same name.
1
2
2
FROM indicates the first table we are
going to select from. You will notice that
we use an ALIAS for our table. This makes
our query more readable. We use the
same alias in the SELECT part of our
statement.
Left joins add data from right set to left set where there is overlap
1
You will notice we introduced a prefix to
our column names. This is to let the
database know which table we are
referring to when we denote the column.
This removes any confusion if two tables
have columns with the same name.
1
2
2
FROM indicates the first table we are
going to select from. You will notice that
we use an ALIAS for our table. This makes
our query more readable. We use the
same alias in the SELECT part of our
statement.
3
LEFT JOIN lets the database know we are
combining data between tables ON a
given condition.
3
Left joins add data from right set to left set where there is overlap
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
LEFT JOIN ORDERS AS B
ON A.SKU = B.SKU
B
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Price
1 5 100
2 10 200
4 20 10
A
SKU Name Description Units Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
3 Bed King Size Bed
AB
Left joins add data from right set to left set where there is overlap
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
LEFT JOIN ORDERS AS B
ON A.SKU = B.SKU
Right joins are essentially just the opposite of left joins
RIGHT JOIN
B
Assume table A contains product information
Assume table B contains product sales information.
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Sold Price
1 5 100
2 10 200
4 20 10
A
Right joins are essentially just the opposite of left joins
A right joined will keep all information in A and B where the SKUs are the same.
If a SKU exists only in table B, the columns for product name & description will be empty.
Conversely, if a SKU is in table A but not B, that row will be omitted.
A right join B
SKU Name Description Units Sold Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
4 20 10
Right joins are essentially just the opposite of left joins
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
RIGHT JOIN ORDERS AS B
ON A.SKU = B.SKU
1
RIGHTJOIN lets the database know we are
combining data between tables ON a
given condition.
1
Right joins are essentially just the opposite of left joins
B
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Price
1 5 100
2 10 200
4 20 10
A
SKU Name Description Units Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
4 20 10
AB
Right joins are essentially just the opposite of left joins
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
RIGHT JOIN ORDERS AS B
ON A.SKU = B.SKU
Inner joins pull data only when there is overlap in both tables
INNER JOIN
B
Assume table A contains product information
Assume table B contains product sales information.
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Sold Price
1 5 100
2 10 200
4 20 10
A
Inner joins pull data only when there is overlap in both tables
An inner join will keep all information in A and B where the SKUs are the same.
All other rows are omitted.
A inner join B
SKU Name Description Units Sold Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
Inner joins pull data only when there is overlap in both tables
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
INNER JOIN ORDERS AS B
ON A.SKU = B.SKU
1
1
Inner joins pull data only when there is overlap in both tables
INNER JOIN lets the database know we are
combining data between tables ON a
given condition.
B
SKU Name Description
1 Lamp Night lamp
2 Chair Wing Chair
3 Bed King Size Bed
SKU Units Price
1 5 100
2 10 200
4 20 10
A
SKU Name Description Units Price
1 Lamp Night lamp 5 100
2 Chair Wing Chair 10 200
AB
Inner joins pull data only when there is overlap in both tables
SELECT
A.SKU,
A.Name,
A.Description,
B.Units,
B.Price
FROM PRODUCTS AS A
INNER JOIN ORDERS AS B
ON A.SKU = B.SKU
Play
Take a look through Big Duck Furniture through
the eyes of a category manager.
Now that you can relate data tables together,
how would this be applicable to your business?
Here are some thoughts:
When & where might we want to join data together?
1
2
3
We now can get all product attributes
We can look at sales by product category
We can look at users who never viewed a product
Do
List all product names and their total order quantities
Remember, if you are not sure
where to look start by going to your
schema.
We know that orders are stored in
the shop_orderitem table. Product
details are stored in the
shop_product table.
SELECT
product.id,
product.name,
SUM(orders.quantity)
FROM
shop_product AS product
INNER JOIN
shop_orderitem AS orders
ON product.id = orders.product_id
GROUP BY
product.id,
product.name
We use an INNER JOIN between the product table
shop_product and orders table shop_orderitem,
matching the ID in the product table to its
equivalent product_id in the orders table.
Note: Since we are aggregating, we must use
GROUP BY and refer to the columns with prefixes.
List all product names and their total order quantities
Challenge
Calculate each category’s total revenue showing just the category id
SELECT
products.category_id,
SUM(orders.quantity) * products.price
FROM shop_product AS products
INNER JOIN shop_orderitem AS orders
ON products.id = orders.product_id
GROUP BY products.category_id
We use an INNER JOIN between the product table
shop_product and orders table shop_orderitem,
matching the ID in the product table to its
equivalent product_id in the orders table.
Note: Since we are aggregating, we must use
GROUP BY and refer to the columns with prefixes.
Challenge Solution
Bonus
Challenge
Can you pull all category names and their respective order quantities?
SELECT
product.category_id,
category.name,
SUM(orders.quantity)
FROM
shop_product AS product
INNER JOIN shop_orderitem AS orders
ON product.id = orders.product_id
LEFT JOIN shop_category AS category
ON product.category_id = category.id
GROUP BY
product.category_id
This query involves two joins.
An inner join between the product and orders
tables to get the sales data.
A left join with the category table to get the
category names.
Note: Joins work in sequence so the output of the
INNER JOIN is the table you then LEFT JOIN onto
Challenge Solution
1
2
1
2
Nested Queries
We can use the output of one query as input for another!
MBASQL.com
Watch
Introduction to Nested Queries
The output of any query you run is effectively one of two
things:
1
2
A single column of data that we treat as a list
A table, just like any other table stored in the database
Introduction to Nested Queries
We will use this concept to do one of two things:
1
2
SELECT data FROM the output of a query
Filter the results of a query WHERE the value of a column
is IN a list from another query
Nested Queries can act as Tables
SELECT
Alias.Column A
FROM
(
SELECT
Column A,
Column B
FROM
Table
WHERE
Condition
) Alias
WHERE
Condition
Instead of using a table in the
database, we enclose a
subquery with brackets.
1
1
The output of this subquery will
be columns A and B. We can
later impose conditions on them.
2
We must give the subquery an alias.
This will become the table name we
use in front of our columns in the
SELECT statement.
3
2
3
Nested Queries can act as Filters
SELECT
Column A
FROM
Table_1
WHERE
Column A IN
(
SELECT
Column X
FROM
Table_2
)
AND Condition
1
We use IN as a logical keyword
to ask the database to match a
column, in the case Column A,
to a list of values.
The list of values is from our
subquery. In this case our
subquery results in the values of
Column X.
1
Play
Take a look through Big Duck Furniture through the eyes of a
category manager.
What are some ways you would be able to use this new tool?
What kind of information can we gather with this tool?
1
2
3
We can now find sales for the top 10 most viewed products
We can find the users who have never viewed or bought
We can find sales of products with missing images
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that orders are stored in
the shop_orderitem table.
How would you get the maximum average order quantity?
How would you get the maximum average order quantity?
SELECT
orders.product_id,
AVG(orders.quantity) AS Quantity
FROM
shop_orderitem AS orders
GROUP BY
orders.product_id
To start, we write out a nested query to get the average order quantity per product.
Now how could we go about getting the maximum average order quantity?
By using a nested query!
How would you get the maximum average order quantity?
SELECT
MAX(AOQ.Quantity)
FROM
(
SELECT
orders.product_id,
AVG(orders.quantity) AS Quantity
FROM
shop_orderitem AS orders
GROUP BY
orders.product_id
) AS AOQ
We use the query in our FROM
clause, and treat it as a table.
In this instance, we gave the
table name AOQ.
This is why in our SELECT
statement, we refer to the table
AOQ.
1
Remember, if you are not sure
where to look start by going to your
schema.
We know that orders are stored in
the shop_orderitem table. Product
details are stored in the
shop_product table.
Get product details of all products with an order
Get product details of all products with an order
SELECT orders.product_id
FROM shop_orderitem AS orders
GROUP BY orders.product_id
We use the query below to get all products that have ever had an order by ID.
What if we wanted to get the product details of all products that ever had an order?
Get product details of all products with an order
SELECT
product.name,
product.price,
product.manufacturer
FROM
shop_product AS product
WHERE
product.id IN
(
SELECT orders.product_id
FROM shop_orderitem AS orders
GROUP BY orders.product_id
)
We use the query in our WHERE
clause, and treat it as a list.
We only include product.id’s
that are in the product_id
column of our subquery.
Note: the subquery must only
ever output one column.
1
Challenge
Find all products that have ever had an impression, but never ordered
SELECT
product_id AS product
FROM
shop_impression
WHERE
product_id NOT IN
(
SELECT product_id
FROM shop_orderitem
)
GROUP BY product_id
Our subquery first looks at all products with an order.
We use the results to exclude those IDs from our
main query.
Our main query looks at all products with an
impression except for the ones in our subquery.
Challenge Solution
Bonus
Challenge
Can you join on all the product details to the query in the last challenge?
SELECT products.name,
COUNT(impressions.id)
FROM
shop_impression AS impressions
LEFT JOIN shop_product AS products
ON impressions.product_id = products.id
WHERE product_id NOT IN
(
SELECT product_id
FROM shop_orderitem
)
GROUP BY products.id, products.name
Using the same query, we introduce a LEFT JOIN
in our statement.
Notice how we are using ID in the GROUP BY but
not in SELECT. This ensures we are grouping by a
unique column.
Challenge Solution
Dealing with Dates
We usually want to report on different time periods
MBASQL.com
Watch
There are two circumstances we find ourselves using dates
There is a date column that we want to transform for our output.
Example: reporting on monthly sales, but our data is stored daily.
2
1
We want to compare a date column to a date stored as a string.
Example: We want all sales after “23/03/2016”.
Before we begin, let’s quickly revisit functions
SELECT
SOME COLUMN,
FUNCTION(SOME COLUMN) AS ALIAS
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
Functions transform data in
a column. They live in the
SELECT portion of a query
1
1
2
Functions transform data in
a column. They live in the
SELECT portion of a query
2
To transform a date column, we use DATE_FORMAT function
SELECT
SOME COLUMN,
DATE_FORMAT (DATE COLUMN, “FORMAT”) AS ALIAS
FROM
SOME TABLE
WHERE
SOME CONDITION
AND SOME OTHER CONDITION
1
DATE_FORMAT() is a function
that takes 2 inputs. The first is
the column name, and the
second is a text string that
tells the database how to
format the date.
1
Dates are formatted in a number of ways with DATE_FORMAT
DATE_FORMAT (DATE COLUMN, “FORMAT”)
Format Description
%a Abbreviated weekday name (Sun-Sat)
%b Abbreviated month name (Jan-Dec)
%D Day of month with English suffix (0th, 1
st
)
%d Day of month, numeric (00-31)
%M Month name (January-December)
%m Month, numeric (00-12)
%u Week (00-53) where Monday is the first day of week
%W Weekday name (Sunday-Saturday)
%Y Year, numeric, four digits
%y Year, numeric, two digits
Some Examples for 25
th
March 2017
DATE_FORMAT (DATE COLUMN, “FORMAT”)
DATE_FORMAT (DATE COLUMN, “%Y-%m-%d”)
2017-03-25
Some Examples for 25
th
March 2017
DATE_FORMAT (DATE COLUMN, “FORMAT”)
DATE_FORMAT (DATE COLUMN, “%m”)
03
DATE_FORMAT (DATE COLUMN, “%Y-%m-%d”)
2017-03-25
Some Examples for 25
th
March 2017
DATE_FORMAT (DATE COLUMN, “FORMAT”)
DATE_FORMAT (DATE COLUMN, “%m-%W”)
03-Sat
DATE_FORMAT (DATE COLUMN, “%m”)
03
DATE_FORMAT (DATE COLUMN, “%Y-%m-%d”)
2017-03-25
To use a date in our filters, we use STR_TO_DATE
SELECT
SOME COLUMN
FROM
SOME TABLE
WHERE
SOME CONDITION
AND DATE COLUMN > STR_TO_DATE(“DATE”,”FORMAT”)
STR_TO_DATE() is a function that
takes 2 inputs. The first is the date
we want to express, and the
second is a text string of how we
are formatting the date.
1
1
How can we format a string to a date?
STR_TO_DATE(“DATE”, “FORMAT”)
Format Description
%a
Abbreviated weekday name (Sun-Sat)
%b
Abbreviated month name (Jan-Dec)
%D
Day of month with English suffix (0th, 1
st
)
%d
Day of month, numeric (00-31)
%M
Month name (January-December)
%m
Month, numeric (00-12)
%u
Week (00-53) where Monday is the first day of week
%W
Weekday name (Sunday-Saturday)
%Y
Year, numeric, four digits
%y
Year, numeric, two digits
Some Examples of formatting date strings
STR_TO_DATE(“DATE”, “FORMAT”)
STR_TO_DATE(“2017-03-25”, “%Y-%m-%d”)
2017-03-25
Some Examples of formatting date strings
STR_TO_DATE(“DATE”, “FORMAT”)
STR_TO_DATE(“2017-03-25”, “%Y-%m-%d”)
2017-03-25
STR_TO_DATE(“2017-03-25”, “%Y”)
ERROR
Some Examples of formatting date strings
STR_TO_DATE(“DATE”, “FORMAT”)
STR_TO_DATE(“2017-03-25”, “%Y-%m-%d”)
2017-03-25
STR_TO_DATE(“2017-03-25”, “%Y”)
ERROR
STR_TO_DATE(“2017”, “%Y”)
2017
Play
Take a look through Big Duck Furniture through the
eyes of a category manager.
If you were managing a category, how does time
play a role in managing your category?
Here are some ideas:
How do dates inform you decisions as Category Manager?
1
2
Total sales by week
Selection creation by date
3
Traffic by day of week
Do
Remember, if you are not sure
where to look start by going to your
schema.
We know that product orders prices
are stored in our shop_orderitem
table.
Write a query to get the units sold by day of the week
We use the aggregate function SUM () on the
column quantity from out shop_orderitem table. This
will output the total units sold of all products in the
table.
We use the column created and transform this date
column into the day of the week denoted by %W.
Write a query to get the units sold by day of the week
SELECT
DATE_FORMAT(created,"%W") AS "Day”,
SUM(quantity) AS "Total Units”
FROM
shop_orderitem
GROUP BY
DATE_FORMAT(created,"%W")
Challenge
Segment traffic by day of the week, what is the most popular day?
We use the aggregate function COUNT () on the
column ID from out shop_impressions table. This will
output the impressions by day of the week.
We use the column created and transform this date
column into the day of the week denoted by %W.
Challenge Solution
SELECT
DATE_FORMAT(created,"%W") AS "Day”,
COUNT(id) AS "Total Impressions”
FROM
shop_impression
GROUP BY
DATE_FORMAT(created,"%W")
Bonus
Challenge
Show product, total quantity, and total impressions by month
SELECT
products.name,
DATE_FORMAT(orders.created,"%m"),
SUM(orders.quantity),
COUNT(impressions.id)
FROM shop_product AS products
INNER JOIN shop_orderitem AS orders
ON orders.product_id = products.id
LEFT JOIN shop_impression AS impressions
ON impressions.product_id = orders.product_id
AND
DATE_FORMAT(orders.created,”%Y%m") = DATE_FORMAT(impressions.created,”%Y%m")
GROUP BY
products.name,
DATE_FORMAT(orders.created,"%m")
We select the necessary columns from our
products, orders, and impressions tables as we
did previously using the DATE_FORMAT() function
to segment by month.
We first INNER JOIN on product id to include only
products that have sold. Then LEFT JOIN on
product ID, but also on the same date format.
This ensures that we are counting impressions
and orders that happened in the same month.
Challenge Solution
Thanks!
MBASQL.com
Please let us know what you thought at http://www.mbasql.com/feedback/
hello@mbasql.com